Porting IDL to Python¶
import numpy as np
import idlwrap
Introduction¶
With numpy
and scipy
, there are powerful and open-source tools available for scientific computing in python. Currently, still lots of scientific projects — especially in astrophysics — rely on the proprietary and expensive IDL programming language instead of moving foward to open and reproducible science. This guide aims to help in porting an IDL codebase to python, while taking full advantage of its powers.
For help with porting specific IDL functions and routines you are invited to look at the source code of idlwrap
, which has porting instructions in its docstrings.
reading this guide¶
This guide contains code examples in both IDL and python. IDL code blocks are prefixed with IDL>
, whereas python code starts with >>>
. Also, IDL functions and routines are represented in uppercase.
The floating-point value 0.1
is not stored as exactly 0.1
in memory, but rather as 3602879701896397 / 2 ** 55
, which is approximatively 0.1000000000000000055511151231257827021181583404541015625...
. These differences add together and lead to the unusual result.
rounding¶
In IDL, ROUND
uses round-half-away-from-zero, also known as commercial rounding. That's what you usually learn in school. It treats positive and negative values symmetrically: If positive and negative numbers are equally probable, this rounding is free of any bias.
IDL> PRINT, ROUND(-0.5), ROUND(0.5), ROUND(1.5), ROUND(2.5)
-1 1 2 3
python / numpy use half-to-even / financial rounding / mathematical rounding, which is the default rounding mode in the IEEE-754 standard. On machines, which represent floating-point numbers using binary approximation, this rounding is non-biased, whereas round half away from zero (like IDL's ROUND
), would be positively biased.
>>> round(-0.5), round(0.5), round(1.5), round(2.5)
(0, 0, 2, 2)
numpy's numpy.around
function and the ndarray.round
method round as python's built-in round
.
porting¶
In general, you don't have to bother which rounding method your program uses. But if you use ROUND
when e.g. determining list indices, this could cause differences. Use idlwrap.round
in that cases, which implements IDL's round-half-away-from-zero rounding.
Precision¶
Floating point numbers are stored internally with a fixed number of bits, or precision. The IEEE Standard for Binary Floating-Point for Arithmetic (IEEE-754) defines
- double precision. python default, used in
float
/np.float64
. IDLDOUBLE
. Contains 53bits of precision. - single precision. IDL default, called
FLOAT
. If you really really need to, usenp.float32
- half precision. listed for completeness. Corresponds to
np.float16
.
IDL often has multiple functions for the different data types, e.g. FINDGEN
(FLOAT
, 32 bit) and DINDGEN
(DOUBLE
, 64 bit), or !PI
(32 bit) and !DPI
(double, 54 bit), while most of numpy's functions accept a dtype=...
argument.
You usually do not need to think about bits in python, just use e.g. np.zeros(...)
for both FLTARR(...)
and DBLARR(...)
.
Note:
INTARR(...)
could be replaced bynp.zeros(..., dtype=int)
Arrays¶
memory order¶
general¶
There are two different ways of storing a matrix/array in memory:
- column-major. The matrix is stored by columns, so the first index is the most rapidly varying index when moving through the elements of the array
- the first index moves to the next row as it changes
- e.g. FORTRAN, IDL
- access element by
[column, row]
, upper-left element is[0,0]
- row-major. The first index is the row.
- last index changes most rapidly as one moves through the array as stored in memory
- e.g. C, Visual Basic, python
- access element by
[row, column]
further reading:
- numpy doc on array indexing order
- IDL article which talks about array order (see point #5)
Example 1¶
Let's look at an example:
IDL> PRINT, FLTARR(2, 4) ; 2 columns
0.00000 0.00000
0.00000 0.00000
0.00000 0.00000
0.00000 0.00000
>>> np.zeros((2,4)) # 4 columns
array([[0., 0., 0., 0.],
[0., 0., 0., 0.]])
In IDL, the first diemsion is the number of columns, the second the number of rows. You index them the same way, [column, row]
--- to get the bottom right element:
IDL> PRINT, (FLTARR(2, 4))[1,3]
0.00000
In Python, the first dimension is the number of rows. Indexing works like [row, column]
, so the bottom right element is
>>> np.zeros((2,4))[1,3]
0.0
Did you notice how the subset-indices are the same for both IDL and python in this case, even if we chose a different element?
Example 2¶
IDL> a = [[1,2,3,4], [5,6,7,8]]
IDL> a
1 2 3 4
5 6 7 8
IDL> SIZE(a)
2 4 2 2 8
; n_dimensions, rows, columns, ...
IDL> a[3, 0]
4
>>> a = np.array([[1,2,3,4], [5,6,7,8]])
>>> a
array([[1, 2, 3, 4],
[5, 6, 7, 8]])
>>> a.shape
(2, 4) # (rows, columns)
>>> a[0, 3] # inverse order compared to IDL!
4
array index ranges¶
In IDL, the index ranges are inclusive (they include the endpoint):
IDL> (FLTARR(10))[3:5]
0.00000 0.00000 0.00000 ; -> three elements
While in python, the endpoint is not included:
>>> np.zeros(10)[3:5]
array([0., 0.]) # -> two elements
This is also the case for the FOR
statement.
idlwrap provides two ways around this. The first one would be to use the
subset_
function:>>> a = np.zeros(10) >>> idlwrap.subset_(a, "[3:5]") array([0., 0., 0.])The second way would be to wrap the array inside
subsetify_
. The resulting object (b
) is like a numpy array, but behaves differently when a string is passed as subset:>>> a = np.zeros(10) >>> b = idlwrap.subsetify_(a) # b is like a numpy array... >>> b[3:5] # python behaviour array([0., 0.]) >>> b["3:5"] # IDL behaviour: pass indices as string array([0., 0., 0.])
float indices¶
IDL automatically floors array indices, so a[1]
and a[1.9]
lead to the same result:
IDL> a = INDGEN(3)
IDL> a
0 1 2
IDL> a[1]
1
IDL> a[1.9]
1
In python, you'll have to int
indices, or numpy
with throw an IndexError
.
FOR
statement¶
In IDL, the endpoint of the FOR
statement is also included in the loop, while python's range
excludes the endpoint.
Example 1: integer ranges¶
IDL> FOR i=4, 6 DO PRINT, i
4
5
6 ; -> 3 elements
>>> for i in range(4, 6):
>>> print(i)
4
5 # 2 elements
A common way of dealing with the endpoint in python is to explicitely increment it:
>>> for i in range(4, 6+1):
>>> print(i)
4
5
6
Example 2: float ranges¶
IDL> FOR i=3.5, 4.5 DO PRINT, i
3.50000
4.50000
While python's built-in range
only supports integer arguments, numpy's arange
also allows floats:
>>> for i in np.arange(3.5, 4.5+1):
>>> print(i)
3.5
4.5
Example 3: endpoint not reached¶
IDL> FOR i=3.5, 5 DO PRINT, i
3.50000
4.50000
Adding an explicit +1
to range
/np.arange
would add another unwanted element to the iteration:
>>> for i in np.arange(3.5, 5+1):
>>> print(i)
3.5
4.5
5.5
An alternative approach would be to add a very small offset, e.g. 1e-12
to the endpoint, which leads to the expected result:
>>> for i in np.arange(3.5, 5+1e-12):
>>> print(i)
3.5
4.5
idlwrap's
idlwrap.range_
uses1e-12
as an offset.
Example 4: float ranges and array indices¶
IDL automatically transforms array indices to integers, so this is perfectly valid:
IDL> a = INDGEN(6)
IDL> for i=0.0, 5, 0.7 DO print, i, a[i]
0.00000 0
0.700000 0
1.40000 1
2.10000 2
2.80000 2
3.50000 3
4.20000 4
4.90000 4
In python, you'll have to int
the indices explicitely: a[int(i)]
.
warning: the following code:
FOR i=0, 5, 0.7 DO print, a[i]would lead to an infinite loop printing
0
! The difference is thei=0
(integer type) instead ofi=0.0
(float).
Matrix multiplication¶
IDL provides two matrix multiplication operators, #
and ##
:
IDL> a = indgen(2, 3)
IDL> a
0 1
2 3
4 5
IDL> b = indgen(3, 2)
IDL> b
0 1 2
3 4 5
IDL> a # b
10 13
28 40
IDL> a ## b
3 4 5
9 14 19
15 24 33
>>> a = np.arange(2*3).reshape((3, 2))
>>> a
array([[0, 1],
[2, 3],
[4, 5]])
>>> b = np.arange(3*2).reshape((2, 3))
>>> b
array([[0, 1, 2],
[3, 4, 5]])
python 3.5+ has a new matrix multiplication operator @
, which behaves like IDL's ##
:
>>> a @ b
array([[ 3, 4, 5],
[ 9, 14, 19],
[15, 24, 33]])
@
is an alias for np.matmul
, the latter also being available in older python/numpy
versions.
To replicate the #
operator, one would have to use .T
to transpose the input and output:
>>> (a.T @ b.T).T
array([[10, 13],
[28, 40]])